Courses take place from 9:15 to 12:00 and from 13:15 to 16:00 in room xxx unless otherwise mentioned.
For group work
Thursday 10th:
Friday 11th:
Here you will find a list of class members, their contact information and the groups.
Dear students,
A warm welcome to the module Data skills for social work professionals!
I would like to give you some important information on the course:
Physical presence in the course is not mandatory except for the last day (Jan 17), but I strongly urge you to participate in the course during the first four days. For those who do work part-time: please schedule your work accordingly.
It is imperative that you gain some first experience with R and RStudio and make sure that both run on your computer. Please follow the instructions in the “Installation of R and R-Studio” guide (https://drive.switch.ch/index.php/s/ktNsnWxwkJ3olWG), and if necessary, refer to the linked instructions on YouTube. If you have any questions, please feel free to contact us via email. Please use Copilot with the prompt below to guide you through the installation, explain the software to you in easy language and show you what you can do with it.
Enroll on the Moodle page (Kurs: Data skills for social work professionals (in English) - HS24 | BFH Moodle – Die Lernplattform der Berner Fachhochschule) with the following key: HS24-bsc. At least a week before the course starts, you will find there a link to the course script, the literature that you need to prepare and other relevant information. We wish you a successful preparation period and look forward to meeting you in person soon. Please let us know if you have any questions.
Kind regards
Dorian Kessler
Text to enter into Copilot (Microsoft Copilot in Bing; important: use the conversation style «more creative/creative mode» (button in the middle of the screen)): As a social work student, I would like to learn the basics of the programming language R so that I can carry out statistical data analyses for social work projects. I have no prior knowledge of statistics or programming. Can you please give me a step-by-step introduction? Please start by asking whether I have installed R and RStudio and, if not, help me install R and RStudio. Then show me the basic commands and functions of R. I would like to learn how to carry out simple data analyses (e.g. comparisons of means with dplyr), visualize data (with ggplot2) and interpret results. Please note the following:
Proceed step by step. Only tell me about the next step once the current step is completed. After each step, ask whether I was able to complete it successfully, to make sure that I did everything correctly.
As a first step, tell me exactly how I can orient myself visually in RStudio and where I need to enter things. Where are the console, the script, the data overview and the file overview located in RStudio?
Explain to me what the console is and what an R script is, how to create and save an R script, and what the purpose of scripts is. Work with me in an R script and tell me how I can run commands.
Please guide me through practical exercises and give me tasks to consolidate what I have learned.
Offer me support when things are unclear.
Work with examples that are relevant to social work. Invent relevant data from the areas of social assistance or child and adult protection.
Comment the code in detail, line by line, so that I understand it exactly.
At the end, offer me further exercises in case I feel like doing more. Suggest exercises.
You are an R expert, but you also know that prospective social workers have little knowledge of programming and that technical terms need an explanation in everyday language.
Thank you for your motivated support and helpfulness! You are helping me learn R and use this knowledge for the benefit of clients.
Important details:
Please leave out «print()» unless it is necessary.
Whenever you mention Strg, also add Ctrl, in case some people have English Windows keyboards.
People gain awareness of data science tools and how they could be used for social work.
People learn how to critically evaluate data science products
People learn how to do data science with R.
Term that emerged ca. 10 years ago. Predecessors: Statistics, Data analysis.
The science of creating valuable information from data
Practice-oriented science
Combines technical and field expertise
Data contains information on human behavior = helps us better understand the human world and solve human problems.
In the era of AI, “data literacy” becomes a key skill in all areas of life, including social work –> it should be a basic competence
Skills to interpret data
Awareness of data and knowing how to use them
Skills to analyze data
You are ChatGPT, and your task is to help me develop a practical example of how data science could be applied in social work. The goal should be an example that is highly useful. Use the file from Dorian Kessler on potential use cases as a reference. Guide me through targeted questions to understand my work context or area of interest and suggest the most relevant application.
Conversation steps:
Understand the context:
Ask me:
“Are you currently working in social work? If not, what area interests you most?”
“Who are the clients or groups you work with or aim to work with?”
“What are common tasks in this field?”
“What are the three most pressing problems in your field?”
“What data is currently available or could be collected to improve workflows?”
Suggest solutions:
Based on my answers and Dorian Kessler’s file, propose 1–2 realistic examples of how data science could address challenges or improve processes. Briefly explain the benefits.
Get feedback and refine:
Ask:
Refine the example with my input and help me select the best option to share with my peers.
You will analyze one of the following data sets and research questions
Form: presentation on final day
R is free and open source.
R has an array of powerful statistical methods.
All additional tools can be freely downloaded, installed and loaded as so-called packages.
With ggplot2, R allows you to create beautiful figures.
With the tidyverse and dplyr, R has the simplest language for data preparation.
R is more than just statistical software (cf. shiny).
R is well known by ChatGPT.
# Install required packages if they are not already installed
required_packages <- c("readxl", "dplyr", "tidyr", "ggplot2", "officer", "flextable")
installed_packages <- installed.packages()
for (pkg in required_packages) {
  if (!(pkg %in% rownames(installed_packages))) {
    install.packages(pkg)
  }
}
# Load the packages (the same ones that were installed above)
library(readxl)
library(dplyr)   # data manipulation and the %>% pipe
library(tidyr)   # pivot_longer()
library(ggplot2)
library(officer)
library(flextable)
# Set the working directory
setwd("C:/Users/kld1/Downloads/")
# 1. Download Excel file
# https://www.pxweb.bfs.admin.ch/pxweb/de/px-x-1304030000_134/-/px-x-1304030000_134.px/table/tableViewLayout2/
url <- "https://www.pxweb.bfs.admin.ch/sq/ecfd5274-e21f-4d26-9bcf-5326af3edc9a"
destfile <- "sozialhilfe.xlsx"
download.file(url, destfile, mode = "wb")
# 2. Read and process data
# Read the Excel sheet (if multiple sheets exist, choose the correct one)
# Assuming the data is in the first sheet
raw_data <- read_excel(destfile, sheet = 1, skip = 2) # Skip the first 2 rows containing metadata
# Process the data: Select columns, rename, filter rows
data <- raw_data %>%
  select(Kanton = "...2", contains("20")) %>%
  filter(!is.na(Kanton), Kanton %in% c("Bern / Berne", "Zürich", "Basel-Stadt", "Genève"))
# Transform the data from wide to long format
long_data <- data %>%
  pivot_longer(
    cols = `2009`:`2022`,
    names_to = "Year",
    values_to = "Count"
  ) %>%
  mutate(Year = as.integer(Year),
         Count = as.numeric(Count))
# 3. Create a plot with ggplot2
# Create a line chart with one line per canton
plot <- ggplot(long_data, aes(x = Year, y = Count, color = Kanton)) +
  geom_line(linewidth = 1) + # linewidth replaces size for lines since ggplot2 3.4.0
  theme_minimal() +
  labs(
    title = "Number of Social Assistance Recipients per Canton (2009-2022)",
    x = "Year",
    y = "Number of Recipients",
    color = "Canton"
  ) +
  theme(
    plot.title = element_text(hjust = 0.5, size = 16, face = "bold"),
    axis.text = element_text(size = 10),
    axis.title = element_text(size = 12),
    legend.title = element_text(size = 12),
    legend.text = element_text(size = 10)
  )
# Save the plot as an image to insert into Word
ggsave("sozialhilfe_plot.png", plot = plot, width = 12, height = 8, dpi = 300)
# 4. Insert the plot into a Word document
# Create a new Word document
doc <- read_docx()
# Add a title
doc <- doc %>%
  body_add_par("Number of Social Assistance Recipients per Canton (2009-2022)", style = "heading 1")
# Add the plot
doc <- doc %>%
  body_add_img(src = "sozialhilfe_plot.png", width = 6, height = 4, style = "centered")
# Optional: Add a table with the data
# long_data already contains only the four selected cantons,
# so this filter simply makes the selection explicit
table_data <- long_data %>%
  filter(Kanton %in% c("Bern / Berne", "Zürich", "Basel-Stadt", "Genève"))
ft <- flextable(table_data) %>%
  # Automatically adjust column widths to fit content
  autofit() %>%
  # Set each of the three columns to a fixed width (in inches)
  width(j = 1:3, width = 1.5) %>%
  # Let the table use 100% of the available page width
  set_table_properties(width = 1, layout = "autofit") %>%
  # Optional: Enhance table aesthetics
  theme_box() %>%
  fontsize(size = 10, part = "all") %>%
  bold(part = "header") # Bold the header row
# Add the table
doc <- doc %>%
  body_add_par("Example Table of the Data", style = "heading 2") %>%
  body_add_flextable(ft)
# Save the Word document
print(doc, target = "Social_Assistance_Report.docx")
RStudio Environment
ChatGPT and other frontier Large Language Models know R pretty well (Copilot also works, but the newest models are more capable).
After you ask a question, tell ChatGPT what your data look like.
If your data are not sensitive, just paste them in to show ChatGPT their structure. If your data are sensitive, paste only the header (= variable names). With Copilot, data protection concerns are smaller.
Paste the resulting code back into the R-Script and run the code
If you have errors, paste the error (from the console) back into ChatGPT and tell it to solve the problem.
If you adapt only parts of your overall code, tell ChatGPT to give you only the relevant code.
If it doesn’t comment code, ask to comment and explain what each piece of code does.
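The tip about sharing only the header of sensitive data can be sketched as follows. The data frame `clients` and all of its columns and values are invented for illustration. Note that `str()` and `head()` print actual values, so for sensitive data a cautious option is to share only the variable names and types.

```r
# Invented example data; all names and values are hypothetical
clients <- data.frame(
  client_id = 1:4,
  canton    = c("Bern", "Basel-Stadt", "Bern", "Basel-Stadt"),
  benefit   = c(1200, 950, 1430, 1100)
)

# Non-sensitive data: show the first rows so the model sees real values
head(clients, 3)

# Sensitive data: share only the variable names ...
variable_names <- names(clients)
variable_names

# ... and, if helpful, the type of each variable (no values are revealed)
variable_types <- sapply(clients, class)
variable_types
```

The printed names and types can then be pasted into the chat together with the question.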
Open a new R script, copy the above code into it and save it
Ask it to assist you while giving helpful and targeted advice, i.e. that it should tell you how to change the code. Try the following tasks:
R allows you to read in data in virtually all common formats, including directly from the internet (see rvest).
The most common data storage format is the Excel table. You can open Excel files with the readxl package.
The most universal data storage format is csv (comma-separated values).
The best tools for large data are the data.table package (to read in large csv files) and the arrow package (to save and read in large data).
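A minimal sketch of the formats mentioned above, assuming a small invented data frame `visits` (not one of the course data sets). The csv part uses only base R. The `data.table::fread()` and `arrow::write_parquet()` / `arrow::read_parquet()` parts run only if those packages are installed.

```r
# Invented example data; names and values are hypothetical
visits <- data.frame(
  offer    = c("Buero", "Turnhalle", "Buero"),
  visitors = c(12, 30, 9)
)

# Write the data to a temporary csv file and read it back with base R
csv_file <- tempfile(fileext = ".csv")
write.csv(visits, csv_file, row.names = FALSE)
visits_csv <- read.csv(csv_file)

# For large csv files, data.table::fread() is usually much faster
if (requireNamespace("data.table", quietly = TRUE)) {
  visits_dt <- data.table::fread(csv_file)
}

# For large data, the parquet format of the arrow package is compact and fast
if (requireNamespace("arrow", quietly = TRUE)) {
  parquet_file <- tempfile(fileext = ".parquet")
  arrow::write_parquet(visits, parquet_file)
  visits_parquet <- arrow::read_parquet(parquet_file)
}
```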
#Set the working directory. Here we use the download folder
setwd("C:/Users/kld1/Downloads/")
#Download data to the folder by hand
#Büro: https://drive.switch.ch/index.php/s/gdNYHopxWDCV9hr
#Turnhalle: https://drive.switch.ch/index.php/s/am1T36ehPL24QuQ
#Install and load excel package
install.packages("readxl")
library(readxl)
#Read in data from the working directory
Buero <- read_excel("OJAOffice_Statistikdaten_Jugendbüro Oberburg 23.xlsx", sheet = "Statistikdaten 2024")
Turnhalle <- read_excel("OJAOffice_Statistikdaten_offene Turnhalle 24.xlsx", sheet = "Statistikdaten 2024")
#Explain objects, observations and variables
#Explain range and col_names = FALSE
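The `range` and `col_names` arguments mentioned in the last comment can be tried out without the course files. This sketch assumes the example workbook `datasets.xlsx` that is installed together with readxl; the cell ranges are chosen only for illustration.

```r
library(readxl)

# Example workbook that is installed together with readxl
path <- readxl_example("datasets.xlsx")

# Default: the first row of the selected range becomes the variable names
with_header <- read_excel(path, sheet = "mtcars", range = "A1:C4")
with_header

# col_names = FALSE: the range contains only values, so readxl
# generates placeholder names (...1, ...2, ...3)
without_header <- read_excel(path, sheet = "mtcars", range = "A2:C4",
                             col_names = FALSE)
without_header
```

Reading a range without a header row is useful when a sheet contains single values or small blocks of numbers rather than a complete table.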
RStudio allows you to manually scroll through data
This helps you better understand what is going on
#Explain what rows (observations) and columns (variables) are.
#You can either click on the object or...
#use View()
View(Buero)
View(Turnhalle)
#Or even fix data (never do this!)
fix(Buero)
| Object name to save the data under | Year | Sheet to read in and additional restrictions | Source link |
|---|---|---|---|
| Maedels_22 | 2022 | Statistikdaten 2024 | OJAOffice_Statistikdaten_Moditrff.xlsx |
| Maedels_23 | 2023 | Moditräff | Statistik OJA Angebote Burgdorf 2023.xlsx |
| Jungs_22 | 2022 | Gieleträff | OJAOffice_Statistikdaten_Gieltrff.xlsx |
| Jungs_23 | 2023 | Gieleträff, range = "A11:B11", col_names = FALSE | Statistik OJA Angebote Burgdorf 2023.xlsx |
| JuBu_23 | 2023 | JuBU Träff 5&6 | Statistik OJA Angebote Burgdorf 2023.xlsx |
| JuBu_24 | 2024 | Statistikdaten 2024 | Copy of OJAOffice_Statistikdaten_Mittelstufentreff 24.xlsx |
filter(): selects a subset of rows (see also slice())
arrange(): sorts
select(): selects columns
mutate(): creates new columns
summarize(): aggregates (collapses) data to individual data points
distinct(): removes duplicate values
group_by(): defines subgroups in the data so that mutate() and summarize() can be applied separately per group.
These functions can be chained with the pipe operator %>%, which makes the code much easier to read and more compact.
#install package
#install.packages("dplyr")
#load package
library(dplyr)
setwd("C:/Users/kld1/switchdrive/BFH/Wichtige Dokumente/Lehre/Data skills for social work professionals/BFH/Daten/Fokus Arbeit/")
focarb <- read.csv("FokusArbeit_Wirkung.csv")
#Select: select the variables Vitality1 (= measurement of vitality before the intervention), Vitality2 (= measurement of vitality after the intervention) and Interventionsgruppe (did the person participate in Fokus Arbeit or receive standard counseling?)
focarb <- focarb %>%
  select(Vitality1, Vitality2, Interventionsgruppe) %>%
  #Filter: remove observations that have a missing value (NA = not available) on the measurement before the intervention. Use logical operators to set the filter condition. Logical operators:
  # is missing: is.na(),
  # bigger than: >,
  # smaller than: <,
  # equals: ==,
  # not equal to: !=,
  # not: !,
  # or: |,
  # is element of: %in%,
  # is infinite: is.infinite().
  filter(!(is.na(Vitality1))) %>%
  #Mutate: calculate a new variable that measures the change in vitality before versus after
  mutate(Change_Vitality = Vitality2 - Vitality1)
#Plot the distribution of the change for the two groups
library(ggplot2)
ggplot(focarb, aes(x = Change_Vitality,
                   fill = factor(Interventionsgruppe))) +
  geom_density(alpha = .5)
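The Fokus Arbeit example uses select(), filter() and mutate(). The remaining verbs from the list (distinct(), group_by(), summarize(), arrange()) can be sketched on a small data set. The data frame `cases` and its values are invented for illustration and are not part of the course data.

```r
library(dplyr)

# Invented example data; names and values are hypothetical
cases <- data.frame(
  canton = c("Bern", "Bern", "Basel-Stadt", "Basel-Stadt", "Basel-Stadt"),
  year   = c(2021, 2022, 2021, 2022, 2022),
  months_on_benefits = c(6, 12, 3, 9, 9)
)

canton_summary <- cases %>%
  distinct() %>%                       # drop the duplicated last row
  group_by(canton) %>%                 # one group per canton
  summarize(
    n_cases     = n(),                 # number of rows per canton
    mean_months = mean(months_on_benefits)
  ) %>%
  arrange(desc(mean_months))           # longest average duration first

canton_summary
```

Because of group_by(), summarize() returns one row per canton instead of one row for the whole data set.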
Goal: find out the share of social workers among the working population in Europe and in Switzerland.
Read in data from the 11 rounds of the European Social Survey with the following steps:
Use the password to download the data
Set the working directory (using setwd()) to your download folder or move the data to your working directory
Make sure you have the arrow package installed (otherwise use install.packages("arrow")).
Read in the data with the read_parquet() command.
Save it into the object ess.
Find out which variables measure the country of origin and the occupation of the respondent using the variable list. Hint 1: start searching from the top. Hint 2: Occupation is measured with the ISCO08 classification. For earlier years, it is the ISCO88 classification, but the data is reduced to years with the ISCO08 classification (after 2010). Use the variable list to find the exact variable names and labels.
Reduce the data frame to those two variables: country, ISCO08.
Filter out observations of individuals where information on ISCO08 is missing. The following values should be excluded:
| Value | Label |
|---|---|
| 66666 | Not applicable* |
| 77777 | Refusal* |
| 88888 | Don’t know* |
| 99999 | No answer* |
Create a new variable socialworker that measures whether someone is a social worker or not (click on the variable to see which numbers stand for social workers). Use the ifelse() function to define the variable.
Calculate the share of social workers in the whole data set using prop.table(table()). How many social workers per 100 working people are there in Europe (the whole data set)?
Reduce the data frame to people from Switzerland and repeat. Are there more social workers per 100 working people in Switzerland than in Europe as a whole?
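The mechanics of the last steps (excluding missing codes, ifelse(), prop.table(table())) can be sketched on invented data. The data frame `survey`, its column names and the code 1234 are placeholders: they are neither the real ESS variable names nor real ISCO08 codes, which you still need to look up in the variable list.

```r
library(dplyr)

# Invented example data. The column names and the code 1234 are
# placeholders, not the real ESS variable names or ISCO08 codes.
survey <- data.frame(
  country    = c("CH", "CH", "DE", "DE", "FR", "FR", "FR", "CH"),
  occupation = c(1234, 5678, 77777, 5678, 1234, 99999, 5678, 5678)
)

survey_clean <- survey %>%
  select(country, occupation) %>%
  # drop the codes that stand for missing information
  filter(!(occupation %in% c(66666, 77777, 88888, 99999))) %>%
  # 1 = social worker, 0 = other occupation
  mutate(socialworker = ifelse(occupation == 1234, 1, 0))

# Share of social workers in the whole (invented) data set
share_all <- prop.table(table(survey_clean$socialworker))
round(100 * share_all, 1)   # expressed per 100 people

# Same calculation for one country only
swiss <- survey_clean %>% filter(country == "CH")
share_ch <- prop.table(table(swiss$socialworker))
round(100 * share_ch, 1)
```

Multiplying the proportions by 100 gives the number of social workers per 100 working people that the exercise asks for.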
Save graphs as png files and link them into Word
Save tables as docx files and link them into Word